Functional Programming in R with purrr

CCHMC R Users Group

Cole Brokamp

2/8/22

Welcome

Join the RUG Outlook group for updates and events.

Which core tidyverse package would you like to learn about in a future RUG meeting?

Functional Programming in R with purrr

1. Functional programming

2. purrr and map() functions

3. purrr and base R alternatives

4. More purrr operations and extensions

Functional programming

Object-oriented and functional programming

R is a functional programming (FP) language, but it has several object-oriented programming (OOP) systems built on top.

  • OOP: a fixed set of operations on things; code grows primarily by adding new things (new classes that implement existing methods)
  • FP: a fixed set of things; code grows primarily by adding new operations (new functions) on existing things

Reducing duplication in code

  • easier to see intent and understand (less code)
  • easier to change (less code to change)
  • easier to find bugs (code used more)

Imperative and functional programming

Imperative: deduplicating repetitive code through iteration

Functional: abstracting away common code into smaller chunks

Iterating using for and while loops

  1. output: allocate space for output
  2. sequence: what to loop over
  3. body: does the work
d <-
  tibble::tibble(a = rnorm(20),
                 b = rnorm(20),
                 c = rnorm(20))

output <- vector("double", ncol(d))  # 1. output
for (i in seq_along(d)) {            # 2. sequence
  output[[i]] <- median(d[[i]])      # 3. body
}
output
[1]  0.4385698 -0.5059396  0.0746266

Iteration vs. functions

  • Loops use bookkeeping code that can be verbose and make it harder to understand the intent of the code
  • Functions extract duplicated code, including for loops, and can be called directly
  • Pass functions as arguments to other functions (!) to call indirectly
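As a minimal sketch of the last bullet, the for loop from the previous slide can be wrapped in a function that takes the summary function itself as an argument. The helper name `col_summary()` and the small data frame `d_small` are illustrative assumptions, not part of the original slides.

```r
# Hypothetical helper (not from the slides): summarise every column of a
# data frame with whatever function is passed in as `fun`.
col_summary <- function(df, fun) {
  output <- vector("double", length(df))   # 1. output
  for (i in seq_along(df)) {               # 2. sequence
    output[[i]] <- fun(df[[i]])            # 3. body
  }
  output
}

# Small illustrative data, chosen so the results are easy to check by hand.
d_small <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))

col_summary(d_small, median)  # median of each column
col_summary(d_small, mean)    # same loop, different operation
```

The bookkeeping is written once; only the operation changes between calls, which is the pattern that map() generalizes.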

Functional approach

Break down common list manipulation challenges into smaller pieces. Solve for one element, then use functional tools to apply to all elements.

Decompose complex problems into small, stepwise pieces.

Increase the speed and clarity of reading, writing, and understanding code.

purrr and map() functions

purrr

map

Loop over a vector, do something to each element, and save the results.

map example

Apply a function to each column in a data.frame:

library(tibble)
library(purrr)

d <-
  tibble(
    a = rnorm(20),
    b = rnorm(20),
    c = rnorm(20)
  )

map(d, median)
$a
[1] -0.1776651

$b
[1] -0.1574246

$c
[1] -0.3309996

map details

  • Preserves length and names of input
  • ... to pass along additional arguments to .f each time it’s called
  • Implemented in C
  • Strict type outputs
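A short sketch of the first two points, using a small made-up list `x` (an assumption for illustration): names and length carry through to the output, and arguments placed after the function are forwarded to it on every call.

```r
library(purrr)

# Illustrative input with missing values (not from the slides).
x <- list(a = c(1, 2, NA), b = c(4, NA, 6))

# `na.rm = TRUE` is passed through `...` to mean() for each element.
res <- map_dbl(x, mean, na.rm = TRUE)
res

names(res)   # same names as the input
length(res)  # same length as the input
```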

map types

function     type
map()        list
map_lgl()    logical
map_int()    integer
map_dbl()    double
map_chr()    character
map_dbl(d, median)
         a          b          c 
-0.1776651 -0.1574246 -0.3309996 

 

map_lgl(d, median)
Error in `map_lgl()`:
ℹ In index: 1.
Caused by error:
! Can't coerce from a double vector to a logical vector.

Using lists of data

split() is from base R and divides a vector or data frame into groups defined by a ‘factor’:

mtcars |>
  split(mtcars$cyl)
$`4`
                mpg cyl  disp hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0 93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7 62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7 52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1 97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0 66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3 91 4.43 2.140 16.70  0  1    5    2
 [ reached 'max' / getOption("max.print") -- omitted 2 rows ]

$`6`
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

$`8`
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]

Anonymous functions and named components

mtcars |>
  split(mtcars$cyl) |>
  map(~ lm(mpg ~ wt, data = .)) |>
  map(summary) |>
  map_dbl("r.squared")
        4         6         8 
0.5086326 0.4645102 0.4229655 

purrr and base R alternatives

lapply() and sapply()

Advantages of using purrr over equivalents in base R

  • First argument is always data; works with pipe
  • Type-stable
  • All map() functions accept functions (named, anonymous, or lambda), character vectors (extract by name), or numeric vectors (extract by position)
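The following sketch illustrates type stability and the extraction shortcuts. The lists `x`, `empty`, and `people` are made up for illustration; `vapply()` is shown as the type-stable option in base R.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)
empty <- list()

# sapply() picks its return type from the input it happens to receive.
class(sapply(x, length))      # an integer vector here
class(sapply(empty, length))  # but a list for empty input

# map_int() returns an integer vector or fails, whatever the input.
lens <- map_int(x, length)
lens_empty <- map_int(empty, length)

# Base R equivalent that is also type-stable.
vapply(x, length, integer(1))

# Character and numeric shortcuts extract by name and by position.
people <- list(
  list(name = "Ada", age = 36),
  list(name = "Grace", age = 45)
)
person_names <- map_chr(people, "name")
person_ages <- map_dbl(people, 2)
```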

More purrr operations and extensions

Dealing with failure

  • safely() + list_transpose() to get a list of all results and a list of all errors
  • possibly(), quietly()
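A minimal sketch of these wrappers, using log() on a made-up list of inputs where the last element is a string and therefore fails:

```r
library(purrr)

# Illustrative inputs: the third element makes log() throw an error.
inputs <- list(1, 10, "a")

# safely() returns list(result = , error = ) for every element;
# list_transpose() turns that inside out into one list of results
# and one list of errors.
results <- inputs |>
  map(safely(log)) |>
  list_transpose()

results$result  # NULL where the call failed
results$error   # NULL where the call succeeded

# possibly() substitutes a default value instead of failing.
logs <- map_dbl(inputs, possibly(log, otherwise = NA_real_))

# quietly() captures output, messages, and warnings alongside the result.
quiet_numeric <- quietly(as.numeric)
converted <- quiet_numeric("a")
converted$warnings
```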

map over multiple arguments

  • map2(): refer to variables with .x and .y
  • pmap(): refer to variables with ..1, ..2, and ..3
    • named arguments for use in pmap (store in a tibble or a list)
  • walk(), walk2(), pwalk() called for side effects
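The bullets above can be sketched as follows. The vectors `means` and `sds` and the list `params` are made-up inputs; a tibble or data frame with the same column names could stand in for `params`.

```r
library(purrr)

means <- c(0, 10, 100)
sds <- c(1, 2, 3)

# map2(): iterate over two vectors in parallel, referred to as .x and .y.
sums <- map2_dbl(means, sds, ~ .x + .y)

# pmap() with positional references ..1, ..2, ..3.
combined <- pmap_dbl(list(1:3, 4:6, 7:9), ~ ..1 + ..2 * ..3)

# pmap() with named elements: names are matched to the arguments of rnorm().
params <- list(n = c(1, 2, 3), mean = means, sd = sds)
draws <- pmap(params, rnorm)
lengths(draws)

# walk(): called for its side effect; returns its input invisibly.
returned <- walk(c("a", "b"), \(x) cat("value:", x, "\n"))
```

Naming the elements passed to pmap() tends to be safer than relying on position, since a reordered column then cannot silently change the call.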

map variants

  • Map conditionally: map_if(), map_at()
  • Map/modify elements at given depth: map_depth(), modify_depth()
  • Modify elements selectively: modify(), modify_if(), modify_at()
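A brief sketch of a few of these variants on made-up inputs (`x`, `scores`, and `nested` are assumptions for illustration):

```r
library(purrr)

x <- list(a = 1:3, b = "text", c = c(1.5, 2.5))

# map_if(): apply mean() only where the predicate is TRUE.
averaged <- map_if(x, is.numeric, mean)

# map_at(): apply rev() only to the named element.
reversed <- map_at(x, "a", rev)

# modify_if(): like map_if(), but returns the same type as the input,
# so a data frame stays a data frame.
scores <- data.frame(id = 1:2, score = c(0.5, 1.5), label = c("x", "y"))
scaled <- modify_if(scores, is.double, ~ .x * 100)

# map_depth(): apply sum() to the elements two levels down.
nested <- list(list(1:2, 3:4), list(5:6))
totals <- map_depth(nested, 2, sum)
```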

Plucking

Get, set, or remove a single element

  • pluck(), chuck()
  • modify_in(), assign_in()
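A minimal sketch of getting, setting, and removing a single nested element. The list `x` is made up for illustration, and removal via `zap()` assumes purrr 1.0.0 or later.

```r
library(purrr)

x <- list(id = 1, info = list(name = "a", tags = c("r", "purrr")))

# pluck(): get a deeply nested element; missing elements give NULL
# (or .default if supplied).
second_tag <- pluck(x, "info", "tags", 2)
missing_value <- pluck(x, "info", "missing")
with_default <- pluck(x, "info", "missing", .default = NA)

# chuck(): same access pattern, but a missing element is an error.
chucked <- tryCatch(
  chuck(x, "info", "missing"),
  error = function(e) "not found"
)

# assign_in(): set a value at a location; modify_in(): transform it.
renamed <- assign_in(x, list("info", "name"), "b")
shouted <- modify_in(x, list("info", "tags"), toupper)

# assign_in() with zap() removes the element.
without_id <- assign_in(x, "id", zap())
```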

Predicate functionals

  • keep(), discard(), compact()
  • keep_at(), discard_at()
  • some(), every(), none()
  • detect(), detect_index()
  • head_while(), tail_while()
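Each of these takes a predicate, that is, a function returning a single TRUE or FALSE. A short sketch on made-up inputs (`x`, `nums`, and the helper `is_even()` are assumptions for illustration):

```r
library(purrr)

x <- list(a = 1, b = "two", c = NULL, d = 4, e = numeric(0))

kept <- keep(x, is.numeric)        # elements where the predicate is TRUE
dropped <- discard(x, is.numeric)  # elements where it is FALSE
compacted <- compact(x)            # drops NULL and zero-length elements
by_name <- keep_at(x, c("a", "b")) # select by name instead of predicate

nums <- c(2, 4, 7, 8)
is_even <- \(n) n %% 2 == 0        # hypothetical helper predicate

some(nums, is_even)    # is any element even?
every(nums, is_even)   # are all elements even?
none(nums, is_even)    # are no elements even?

first_big <- detect(nums, \(n) n > 5)            # first matching value
first_big_at <- detect_index(nums, \(n) n > 5)   # and its position

leading <- head_while(nums, is_even)   # leading run that satisfies it
trailing <- tail_while(nums, is_even)  # trailing run that satisfies it
```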

Monitor map code with progress bars from cli

walk(1:3, Sys.sleep, .progress = TRUE)

dplyr and list columns

Use {dplyr} to nest each group's rows into a list-column named data, then add new columns computed from it:

library(dplyr)

mtcars |>
  nest_by(cyl)
# A tibble: 3 × 2
# Rowwise:  cyl
    cyl                data
  <dbl> <list<tibble[,10]>>
1     4           [11 × 10]
2     6            [7 × 10]
3     8           [14 × 10]

dplyr::mutate() using map

mtcars |>
  group_by(cyl) |>
  tidyr::nest() |>
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .)),
         summary = map(model, summary),
         rsq = map_dbl(summary, "r.squared"))
# A tibble: 3 × 5
# Groups:   cyl [3]
    cyl data               model  summary      rsq
  <dbl> <list>             <list> <list>     <dbl>
1     6 <tibble [7 × 10]>  <lm>   <smmry.lm> 0.465
2     4 <tibble [11 × 10]> <lm>   <smmry.lm> 0.509
3     8 <tibble [14 × 10]> <lm>   <smmry.lm> 0.423

dplyr::nest_by()

mtcars |> 
  nest_by(cyl) |>
  mutate(model = list(lm(mpg ~ wt, data = data)),
         summary = list(summary(model)),
         rsq = summary$r.squared)
# A tibble: 3 × 5
# Rowwise:  cyl
    cyl                data model  summary      rsq
  <dbl> <list<tibble[,10]>> <list> <list>     <dbl>
1     4           [11 × 10] <lm>   <smmry.lm> 0.509
2     6            [7 × 10] <lm>   <smmry.lm> 0.465
3     8           [14 × 10] <lm>   <smmry.lm> 0.423

dplyr::nest_by() with multiple grouping variables

mtcars |> 
  nest_by(cyl, vs) |>
  mutate(model = list(lm(mpg ~ wt, data = data)),
         summary = list(summary(model)),
         rsq = summary$r.squared)
# A tibble: 5 × 6
# Rowwise:  cyl, vs
    cyl    vs               data model  summary       rsq
  <dbl> <dbl> <list<tibble[,9]>> <list> <list>      <dbl>
1     4     0            [1 × 9] <lm>   <smmry.lm> 0     
2     4     1           [10 × 9] <lm>   <smmry.lm> 0.520 
3     6     0            [3 × 9] <lm>   <smmry.lm> 0.0103
4     6     1            [4 × 9] <lm>   <smmry.lm> 0.876 
5     8     0           [14 × 9] <lm>   <smmry.lm> 0.423 

Extensions

Thank You

🌐 https://colebrokamp.com

👨‍💻️ github.com/cole-brokamp

🐦 @cole_brokamp

📧 cole.brokamp@cchmc.org